Abstract: Multiple-input multiple-output (MIMO) wireless
transmission imposes huge challenges on the design of efficient
hardware architectures for iterative receivers. A major challenge
is soft-input soft-output (SISO) MIMO demapping, often
approached by sphere decoding (SD). In this paper, we introduce
what is, to the best of our knowledge, the first VLSI architecture
for SISO SD applying a single tree-search approach. Compared with
a soft-output-only base architecture similar to the one proposed by
Studer et al. in IEEE J-SAC 2008, the architectural modifications
for soft input still allow a one-node-per-cycle execution. For a 4x4
16-QAM system, the area increases by 57 % and the operating
frequency degrades by only 34 %.

Index Terms: VLSI architecture, Schnorr-Euchner (SE) enumeration,
iterative multiple-input multiple-output (MIMO) decoding,
soft-input soft-output (SISO) sphere decoding (SD)



I. Introduction 

Multiple-input multiple-output (MIMO) wireless transmis- 
sions utilizing spatial multiplexing achieve an increased spec- 
tral efficiency compared with single-antenna systems. This im- 
provement comes at the cost of an increased signal-demapping 
complexity, which becomes particularly critical for iterative 
receivers [1]. Recent developments of soft-input soft-output 
(SISO) MIMO-demapping algorithms reduced this complexity 
significantly. Prominent demapping algorithms are k-best and 
list-based approaches [2], [3], Markov chain Monte Carlo 
algorithms (MCMC) [4] and single tree-search (STS) sphere 
decoders (SD) [5]. The STS approach is often preferred since it 
guarantees max-log maximum a posteriori (MAP) optimality. 

Efficient VLSI implementations have been proposed for
soft-output-only STS SDs [6], [7] exploiting geometric properties
of QAM constellations. These geometric relations help to
determine a search order, referred to as enumeration, that leads
to a fast average tree-search convergence. The SISO STS
complexity has been prohibitive for VLSI implementations so
far, because geometric relations are not directly applicable.
Recent improvements of soft-input enumeration strategies moved
SISO STS SD closer to VLSI architectures [8].

Contributions: In this paper, we introduce what is, to the best of
our knowledge, the first VLSI architecture for SISO STS SD. It is
based on a soft-output-only architecture following the
one-node-per-cycle (ONPC) paradigm used by [6]. The SISO
modifications are modular enough to be applied to other
existing STS SD architectures and still allow ONPC execution.
Compared with a soft-output-only architecture, the area
increases by 57 % and the clock frequency degrades by 34 %
for a 4x4 16-QAM system. Thus, this architecture enables
STS-based iterative wireless MIMO receivers.

The paper is organized as follows: Section II sums up the 
basics of SISO STS SD, extended by the soft-input enumer- 
ation strategy in Section III. Section IV describes important 
implementation aspects of the scalable VLSI architecture. In 
Section V the parameter design space of the SISO STS archi- 
tecture as well as area, timing and throughput are discussed. 

II. Single Tree-Search Soft-Input Sphere 
Decoding 

A spatial-multiplexing MIMO scheme with $M_T$ transmit
and $M_R \geq M_T$ receive antennas is assumed [1]. Each
transmit antenna sends one of the $2^Q$ complex elements of
the symbol set $\mathcal{O}$ defined by the modulation alphabet, which
is assumed to be the same for every antenna. Each vector
$s = [s_1, \ldots, s_{M_T}]^T \in \mathcal{O}^{M_T}$ results from mapping $M_T Q$ bits
$x_{i,b} \in \{+1, -1\}$ to an element of $\mathcal{O}^{M_T}$, with $i$ being the
antenna index and $b$ the bit index for one scalar symbol $s_i$.

The received symbol vector $y \in \mathbb{C}^{M_R}$ is given by
$y = Hs + n$, where $H \in \mathbb{C}^{M_R \times M_T}$ is the channel matrix
and $n \in \mathbb{C}^{M_R}$ is a white circular Gaussian noise vector with
variance $N_0$ per element. For tree-search SD, $H$ is typically
QR-decomposed (QRD) with $H = QR$, $Q \in \mathbb{C}^{M_R \times M_T}$ with
$Q^H Q = I$, and $R \in \mathbb{C}^{M_T \times M_T}$ being an upper triangular matrix
[1], [5]. With $\tilde{y} = Q^H y$ and $\tilde{n} = Q^H n$, this results in

$$\tilde{y} = R s + \tilde{n} \qquad (1)$$
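The step from the received vector to equation (1) can be written out in two lines. The derivation below is a sketch that uses only the properties stated above, namely $Q^H Q = I$ and $E[n n^H] = N_0 I$, and writes $\tilde{y} = Q^H y$ and $\tilde{n} = Q^H n$ for the rotated quantities. It shows that the rotated noise keeps its per-element variance $N_0$, so the metrics computed on the triangular system see the same noise statistics.

```latex
\begin{align}
\tilde{y} &= Q^H y = Q^H (Q R s + n) = R s + \tilde{n}, \\
E[\tilde{n}\tilde{n}^H] &= Q^H E[n n^H] Q = N_0\, Q^H Q = N_0 I .
\end{align}
```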



According to [5], the triangular matrix $R$ in equation (1)
allows the SISO max-log MAP MIMO detection problem to be
formulated as an STS within a $2^Q$-ary complete tree. The tree levels
correspond to the $M_T$ antennas, each node $s_i \in \mathcal{O}$ on tree level
$i$ is a received symbol candidate, with $s_1$ being a leaf node. An
exhaustive search in such a tree leads to a worst-case run-time
complexity of $O(2^{Q M_T})$. As formalized in equations (2) to (4),
metric increments $\mathcal{M}_C(s_i)$ for channel-based and $\mathcal{M}_A(s_i)$ for
a priori-based information are summed up to a total increment
$\mathcal{M}_P(s_i)$. $P[s_i]$ is the symbol probability computed from the
a priori log-likelihood ratios (LLRs) $L^A_{i,b}$.

$$\mathcal{M}_C(s_i) = \frac{1}{N_0} \Big| \tilde{y}_i - \sum_{j=i}^{M_T} R_{i,j} s_j \Big|^2 \qquad (2)$$

$$\mathcal{M}_A(s_i) = -\log P[s_i] \qquad (3)$$

$$\mathcal{M}_P(s_i) = \mathcal{M}_C(s_i) + \mathcal{M}_A(s_i) \qquad (4)$$
The sum of metric increments along a path from the root to
node $s_i$ yields the partial metric $\mathcal{M}_P(s^{(i)})$ for a partial symbol
vector $s^{(i)} = [s_i, \ldots, s_{M_T}]^T$:

$$\mathcal{M}_P(s^{(i)}) = \sum_{j=i}^{M_T} \mathcal{M}_P(s_j) \qquad (5)$$
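The metric recursion can be illustrated with a few lines of Python. This is a minimal sketch, not the fixed-point datapath of the architecture: it assumes the channel increment is the squared residual of row $i$ of the triangular system divided by $N_0$ and the a priori increment is $-\log P[s_i]$, and the names `metric_increment`, `partial_metric` and `prior` are hypothetical. Levels are 0-based, so the root is the last index.

```python
import math

def metric_increment(y_t, R, s, i, N0, prior):
    """Total increment M_P(s_i) = M_C(s_i) + M_A(s_i) on level i (0-based)."""
    # Residual of row i after cancelling the symbols s_i .. s_{M_T}.
    resid = y_t[i] - sum(R[i][j] * s[j] for j in range(i, len(s)))
    m_c = abs(resid) ** 2 / N0        # channel-based increment
    m_a = -math.log(prior[s[i]])      # a priori-based increment
    return m_c + m_a

def partial_metric(y_t, R, s, i, N0, prior):
    """Partial metric M_P(s^(i)): increments summed from the root down to level i."""
    return sum(metric_increment(y_t, R, s, j, N0, prior)
               for j in range(i, len(s)))

# Toy 2x2 example with a real two-point alphabet and uniform priors (assumed values).
R = [[2.0, 1.0],
     [0.0, 1.0]]
prior = {+1: 0.5, -1: 0.5}
s_true = [+1, -1]
y_t = [R[0][0] * s_true[0] + R[0][1] * s_true[1], R[1][1] * s_true[1]]  # noise-free

m_true = partial_metric(y_t, R, s_true, 0, 1.0, prior)
m_wrong = partial_metric(y_t, R, [-1, -1], 0, 1.0, prior)
```

In the noise-free toy case the transmitted vector accumulates only the a priori terms, while the competing vector picks up an additional channel penalty on the leaf level.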



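During an STS, the MAP solution $s^{\mathrm{MAP}}$, its bits $x_{i,b}^{\mathrm{MAP}}$, its metric
$\lambda^{\mathrm{MAP}} = \mathcal{M}_P(s^{\mathrm{MAP}})$ and the extrinsic counter-hypothesis metrics
$\Lambda_{i,b}^{\overline{\mathrm{MAP}}}$ are computed by successively improving the current
metrics $\lambda^{\mathrm{MAP,cur}}$ and $\Lambda_{i,b}^{\overline{\mathrm{MAP}},\mathrm{cur}}$. $L^E_{i,b}$ are the extrinsic LLRs, with

$$s^{\mathrm{MAP}} = \arg\min_{s \in \mathcal{O}^{M_T}} \{ \mathcal{M}_P(s) \}$$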




Fig. 1. Hybrid-enumeration example; the superscript $(k)$ denotes the $k$-th symbol of $\mathcal{O}$ in SE order.
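These metric computations dominate the detection complexity.
For a depth-first tree search, the pruning of sub-trees lying
outside a hypersphere with a radius not improving
$\Lambda_{i,b}^{\overline{\mathrm{MAP}},\mathrm{cur}} = \lambda_{i,b}^{\overline{\mathrm{MAP}},\mathrm{cur}} - L^A_{i,b} x_{i,b}^{\mathrm{MAP}}$
provides a heuristic for complexity reduction which is sensitive
to the visiting order $[s_i^{(1)}, \ldots, s_i^{(|\mathcal{O}|)}]$.
A Schnorr-Euchner (SE) order [9] provides a very fast search
convergence by the following pruning criteria [6], typically
defining the pruning metrics
$\mathcal{M}_P^{\mathrm{down}} = \mathcal{M}_P^{\mathrm{sibl.}} = \mathcal{M}_P(s^{(j)})$:

$$\mathcal{M}_P^{\mathrm{down}} > \max \big\{ \lambda^{\mathrm{MAP,cur}},\ \Lambda_{i,b}^{\overline{\mathrm{MAP}},\mathrm{cur}} \,\big|\, i < j \lor x_{i,b} \neq x_{i,b}^{\mathrm{MAP,cur}},\ \forall b \big\} \qquad (6)$$

$$\mathcal{M}_P^{\mathrm{sibl.}} > \max \big\{ \lambda^{\mathrm{MAP,cur}},\ \Lambda_{i,b}^{\overline{\mathrm{MAP}},\mathrm{cur}} \,\big|\, i \leq j \lor x_{i,b} \neq x_{i,b}^{\mathrm{MAP,cur}},\ \forall b \big\} \qquad (7)$$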



If inequality (6) holds, the current node and its sub-tree
are pruned, otherwise a step down is performed in the tree.
If inequality (7) holds, the enumeration on level $j$ stops,
otherwise the sibling of the current node is enumerated. The
arguments of the max operators in (6) and (7) are the sets A
and B respectively in [6]. We define an examined node (as used
in [6] and [7]) as a node $s_j$ that has been checked against at
least one pruning criterion, leading to the complexity measure
number of examined nodes per detected symbol vector, $N_{en}$.

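If a leaf node with $\mathcal{M}_P(s) > \lambda^{\mathrm{MAP,cur}}$ is not pruned by
inequalities (6) or (7), the values
$\{\Lambda_{i,b}^{\overline{\mathrm{MAP}},\mathrm{cur}} \mid x_{i,b} \neq x_{i,b}^{\mathrm{MAP,cur}}\}$
need to be updated by
$\min\{\Lambda_{i,b}^{\overline{\mathrm{MAP}},\mathrm{cur}},\ \mathcal{M}_P(s) - L^A_{i,b} x_{i,b}^{\mathrm{MAP,cur}}\}$.
Otherwise, if $\mathcal{M}_P(s) < \lambda^{\mathrm{MAP,cur}}$, the current leaf becomes
the new MAP solution and the extrinsic counter-hypothesis
metrics $\{\Lambda_{i,b}^{\overline{\mathrm{MAP}},\mathrm{cur}} \mid x_{i,b}^{\mathrm{MAP,old}} \neq x_{i,b}^{\mathrm{MAP,cur}}\}$ are updated by
$\min\{\Lambda_{i,b}^{\overline{\mathrm{MAP}},\mathrm{cur}},\ \lambda^{\mathrm{MAP,old}} - L^A_{i,b} x_{i,b}^{\mathrm{MAP,cur}}\}$.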

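Many methods exist to reduce $N_{en}$, like sorted QRD
(SQRD) [10] and extrinsic LLR clipping [5]. The latter
limits the allowed range for $L^E_{i,b}$ to
$|L^E_{i,b,\mathrm{clipped}}| \leq L_{\max}$, which
leads to clipped extrinsic metrics $\Lambda_{i,b,\mathrm{clipped}}^{\overline{\mathrm{MAP}}}$:

$$\Lambda_{i,b,\mathrm{clipped}}^{\overline{\mathrm{MAP}}} = \max\big\{ \lambda^{\mathrm{MAP}} - L_{\max},\ \min\{ \lambda^{\mathrm{MAP}} + L_{\max},\ \Lambda_{i,b}^{\overline{\mathrm{MAP}}} \} \big\} \qquad (8)$$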
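The two-sided clipping of equation (8) amounts to clamping the counter-hypothesis metric to a window of half-width $L_{\max}$ around the MAP metric. The following sketch assumes that reading of (8); the function name `clip_extrinsic_metric` and the numeric values are illustrative only.

```python
def clip_extrinsic_metric(lam_cp, lam_map, l_max):
    """Clamp a counter-hypothesis metric to [lam_map - l_max, lam_map + l_max]."""
    return max(lam_map - l_max, min(lam_map + l_max, lam_cp))

# Illustrative values: MAP metric 2.0, clipping level 3.0.
upper = clip_extrinsic_metric(10.0, 2.0, 3.0)    # clipped from above
lower = clip_extrinsic_metric(-4.0, 2.0, 3.0)    # clipped from below
inside = clip_extrinsic_metric(3.0, 2.0, 3.0)    # already inside the window
```

After clipping, the distance between the counter-hypothesis metric and the MAP metric never exceeds the clipping level, which is what bounds the magnitude of the extrinsic LLR without a separate post-processing step.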

Please note that equation (8) is stricter than the min{} function
used in [5], where a post-processing step is used to guarantee
$|L^E_{i,b,\mathrm{clipped}}| \leq L_{\max}$ for proper channel decoding. In [5], this
saves 50 % of the comparisons required for clipping. Experiments
indicate that $E[N_{en}]$ differs only marginally between the
two clipping methods. Moreover, radius tightening further
reduces $N_{en}$. A hardware-friendly approximation of $\mathcal{M}_A(s_i)$ for
statistically independent symbols, including tightening and still
guaranteeing max-log-optimal a posteriori LLRs, has been
proposed in [5] (with unipolar bits $d_{i,b} = \frac{1}{2}(1 - x_{i,b} \cdot \mathrm{sign}(L^A_{i,b}))$):

$$\mathcal{M}_A(s_i) = -\log P[s_i] \approx \sum_{b=1}^{Q} \begin{cases} |L^A_{i,b}|, & d_{i,b} = 1 \\ 0, & \text{otherwise} \end{cases} \qquad (9)$$
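Equation (9) can be read as "add the LLR magnitude of every bit that disagrees with the sign of its a priori LLR". A minimal Python sketch under that reading follows; the helper name `apriori_metric` is hypothetical, and treating a zero LLR as positive is an assumption made only to keep the example well defined.

```python
def apriori_metric(bits, llrs):
    """Approximate M_A(s_i): sum of |L^A| over bits that contradict sign(L^A).

    bits: x_{i,b} in {+1, -1}; llrs: a priori LLRs L^A_{i,b}.
    """
    total = 0.0
    for x, llr in zip(bits, llrs):
        sign = 1 if llr >= 0 else -1      # assumption: sign(0) = +1
        d = (1 - x * sign) // 2           # unipolar bit, 1 means disagreement
        if d == 1:
            total += abs(llr)
    return total

llrs = [2.0, 1.5, -0.5]
m_best = apriori_metric([+1, +1, -1], llrs)   # all bits follow the LLR signs
m_other = apriori_metric([+1, -1, +1], llrs)  # bits 1 and 2 disagree
```

The label whose bits all follow the LLR signs always has metric zero, which is why the first step of the a priori-based enumeration needs no computation.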



III. The Hybrid-Enumeration Algorithm 



A major issue of SD algorithms is the enumeration process,
namely the determination of the SE order $[s_i^{(1)}, \ldots, s_i^{(|\mathcal{O}|)}]$ on
a level $i$, with $s_i^{(k)}$ representing the $k$-th candidate for node $s_i$
in ascending order of $\mathcal{M}_P$. A straightforward implementation
by computing and fully sorting the set $\{\mathcal{M}_P(s_i^{(k)})\}$ is very
expensive and inefficient. For the soft-output-only case, the
geometric properties of the QAM constellation can be exploited
to avoid full sorting and thus save most of the computations, as
proposed in [6], [7], [11]. However, in iterative receivers these
optimizations are not directly usable because the geometry-based
order is scrambled by the a priori information. A viable
approach towards efficient soft-input enumeration is given by
the hybrid-enumeration algorithm presented in [8]. Its basic
idea is to split the enumeration of $\{\mathcal{M}_P(s_i)\}$ into two
concurrent enumerations of $\{\mathcal{M}_C(s_i)\}$ and $\{\mathcal{M}_A(s_i)\}$.

On the one hand, the enumeration of $\{\mathcal{M}_C(s_i)\}$ is the
same as in the soft-output-only case, thus allowing the reuse of any
of the aforementioned efficient methods, even in later
iterations. On the other hand, the enumeration of $\{\mathcal{M}_A(s_i)\}$
is efficient as well since the linear sorting of the symbol set $\mathcal{O}$
needs to be performed independently only once per antenna.

According to [8], the channel- and a priori-based enumerations
independently select candidate symbols $s_{C,i}^{(k)}$ and $s_{A,i}^{(k)}$
at each step $k$. The hybrid enumeration simply selects the
candidate with the lower metric $\mathcal{M}_P$ between these two.

As visualized in Figure 1, the strict SE order is not preserved,
hence the inequality $\mathcal{M}_P(s_i^{(k)}) \leq \mathcal{M}_P(s_i^{(l)}), \forall l > k$
does not hold any more. Thus, a modification of the pruning
criteria is needed to avoid the erroneous exclusion of the MAP
or counter-hypothesis solutions. For $l > k$, the inequalities
$\mathcal{M}_C(s_{C,i}^{(k)}) \leq \mathcal{M}_C(s_{C,i}^{(l)})$ and $\mathcal{M}_A(s_{A,i}^{(k)}) \leq \mathcal{M}_A(s_{A,i}^{(l)})$ lead to
$\mathcal{M}_C(s_{C,i}^{(k)}) + \mathcal{M}_A(s_{A,i}^{(k)}) \leq \mathcal{M}_P(s_i^{(l)})$, providing an alternative
lower bound for tree pruning. Thus, in [8] the pruning metric
of inequality (7) on the current tree level $i$ is re-defined as

$$\mathcal{M}_P^{\mathrm{sibl.}} := \mathcal{M}_C(s_{C,i}^{(k)}) + \mathcal{M}_A(s_{A,i}^{(k)}) + \mathcal{M}_P(s^{(i+1)}) \qquad (10)$$

Compared with the SE order, pruning metric (10) preserves
the error-rate performance at the price of a slight increase
in $N_{en}$. For a more detailed description and analysis of the
hybrid-enumeration algorithm, the reader is referred to [8].
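The selection rule and the lower bound behind pruning metric (10) can be sketched in software. The code below is an illustrative model of one tree level, not the algorithm of [8] verbatim: symbols are plain list indices, both local orders are obtained by full sorting (the hardware avoids this for the channel part), and the names `hybrid_enumeration`, `m_c` and `m_a` are hypothetical.

```python
def hybrid_enumeration(m_c, m_a):
    """Return [(symbol, M_P, lower_bound)] in hybrid order for one tree level.

    m_c, m_a: per-symbol channel-based and a priori-based metric increments.
    lower_bound is M_C of the channel candidate plus M_A of the a priori
    candidate, which bounds M_P of every symbol not yet enumerated.
    """
    n = len(m_c)
    order_c = sorted(range(n), key=lambda s: m_c[s])   # channel-based order
    order_a = sorted(range(n), key=lambda s: m_a[s])   # a priori-based order
    done = [False] * n                                 # enumerated-nodes flags
    pc = pa = 0
    result = []
    for _ in range(n):
        # Both enumerators skip symbols that were already enumerated.
        while done[order_c[pc]]:
            pc += 1
        while done[order_a[pa]]:
            pa += 1
        cand_c, cand_a = order_c[pc], order_a[pa]
        bound = m_c[cand_c] + m_a[cand_a]
        m_p_c = m_c[cand_c] + m_a[cand_c]
        m_p_a = m_c[cand_a] + m_a[cand_a]
        pick = cand_c if m_p_c <= m_p_a else cand_a    # lower total metric wins
        done[pick] = True
        result.append((pick, m_c[pick] + m_a[pick], bound))
    return result

steps = hybrid_enumeration([0.1, 0.5, 0.9, 2.0], [3.0, 0.0, 1.0, 0.2])
order = [sym for sym, _, _ in steps]
```

For these assumed metrics the hybrid order visits a symbol with total metric 2.2 before one with 1.9, so the strict SE order is indeed violated, while the bound stays below the metric of every symbol that is still outstanding.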

IV. A VLSI Architecture for STS Soft-Input 
Sphere Decoding 

In this section, a VLSI architecture for SISO STS SD is
introduced. It is derived from a soft-output-only depth-first
STS base architecture extended by soft-input processing. The
main challenges that arise from implementing efficient soft-input
extensions according to the hybrid-enumeration scheme are
discussed. Further algorithmic optimizations such
as LLR correction proposed in [5] are orthogonal to the base
architecture and can be implemented on top of it.

A. Soft-Output-Only Base Architecture

The soft-output-only base STS architecture, composed of
the light gray blocks in Figure 2, follows the ONPC execution
principle used by Studer et al. in [6]. Its architectural
structure is derived from the observation that the tree search
is composed of three basic control-flow steps:

i) Vertical steps down from tree level $i$ to $i-1$ enumerate
the first child node $s_{i-1}^{(1)}$ of a parent node $s_i^{(k)}$. This requires
a quantization step $\mathcal{Q}$ to find the QAM symbol next to $\tilde{y}_i$,
followed by the computation of $\mathcal{M}_P(s_{i-1}^{(1)})$. The result of $\mathcal{Q}$ is
used to initialize the enumeration on the tree level $i-1$ and
by the pruning-criteria check for $s_{i-1}^{(1)}$.

ii) Horizontal steps on a tree level $i$ enumerate the node
$s_i^{(k+1)}$ after enumerating the node $s_i^{(k)}$ and its sub-tree. This
category also includes steps back from a child node $s_{i-1}$ to
the next sibling $s_i^{(k+1)}$ of its parent node $s_i^{(k)}$.

iii) Pruning-criteria checks for a node $s_i^{(k)}$ determine if
either a vertical step to the child $s_{i-1}^{(1)}$, a horizontal step to the
sibling $s_i^{(k+1)}$ or a horizontal step to its parent's sibling on level $i+1$
has to be performed next. The $\mathcal{M}_P$ history unit stores the
partial metrics $\mathcal{M}_P(s^{(i)})$, recursively implements equation (5)
and provides its result to the pruning-criteria unit for pruning
and LLR clipping by equation (8).

In a depth-first SD, the tree-traversal control flow exhibits
severe data and control dependencies. In order to achieve a
throughput of one examined node per cycle, the base architecture
executes the pruning check for node $s_i^{(k)}$ concurrently
with the steps towards $s_{i-1}^{(1)}$ and $s_i^{(k+1)}$ in cycle $n$. If the
pruning check selects $s_{i-1}^{(1)}$, $s_i^{(k+1)}$ is saved in a
preferred-siblings cache for later use during a step up in the tree.
Thus, in cycle $n+1$ the availability of a valid node for the
next pruning check is guaranteed.

The enumeration unit of the base architecture employs the
column-wise zig-zag enumeration strategy presented in
[11]. Compared with circular PSK-like enumeration [6], the
column-wise enumeration allows a much more regular hardware
implementation. Furthermore, for 64 QAM and higher
modulation orders it requires fewer comparisons.

Fig. 2. Block diagram of the proposed soft-input STS SD VLSI architecture.
Units added/modified for soft-input are emphasized by dark gray background.
Legend: Mapper $\mathcal{M}$, Demapper $\mathcal{D}$, Quantizer $\mathcal{Q}$.

Since there is no assumption on the mapping between QAM
symbols and bits, two run-time-programmable lookup tables,
named mapper $\mathcal{M}$ and demapper $\mathcal{D}$ respectively, are used for
the conversion between the symbol and the bit representations.

B. Soft-Input Extensions 

In order to extend the base architecture presented in Sec- 
tion IV-A, mainly extra units for the a priori-based enumera- 
tion have to be added, along with slight changes in the column- 
wise zig-zag implementation. These extensions correspond to 
the dark gray units in Figure 2. 

1) Enumerated-nodes flags: Both channel- and a priori-based
enumeration units have to skip nodes that have already
been enumerated, because the local enumeration orders for
$\mathcal{M}_C$ and $\mathcal{M}_A$ differ from the global enumeration order.
Therefore, both units need the list of enumerated nodes to
guarantee that each node is enumerated only once. This flag
vector of $2^Q$ bits per antenna is maintained in a dedicated unit.

2) Modified column-wise zig-zag enumeration: Skipping
an arbitrary number of nodes implies modifications to the
column-wise zig-zag implementation. Compared with the
base architecture, the new column-enumeration unit does not
keep internal zig-zag states any more. Instead, each column
enumeration performs a minimum search over the linear
distances between the quantized imaginary part
$\mathcal{Q}(\mathrm{Im}\{\tilde{y}_i - \sum_{j=i+1}^{M_T} R_{i,j} s_j\})$ and all rows $\{\mathrm{Im}\{s_i\} \mid s_i \in \mathcal{O}\}$ masked by the
enumerated-nodes flags. The hardware complexity increases
only moderately, because distance computations are the same
for all columns and operate on words of only $Q/2+1$ bits.

3) A priori-based enumeration: With $d_i$ being the decimal
representation of the bit vector $[d_{i,b}]$, a mapping of $d_i$
to the corresponding symbol $s_i(d_i)$, $\mathcal{M}_A(d_i) = \mathcal{M}_A(s_i(d_i))$
and an order defined by $s_i(d_i^{(k)}) = s_{A,i}^{(k)}$, one problem of
enumerating $\{\mathcal{M}_A\}_i = \{\mathcal{M}_A(d_i) \mid 0 \leq d_i < 2^Q\}$ is the lack of
relations among a priori LLRs. Thus, the only known solution
is the full computation and sorting of $\{\mathcal{M}_A\}_i$.

First, the computation of $\{\mathcal{M}_A\}_i$ requires $2^Q - Q - 1$
additions per antenna and received vector. Due to the ONPC
principle and the structure of (9), the number of hardware
adders can be reduced by resource sharing. The first enumeration
step always results in $d_i^{(1)} = 0$ and $\mathcal{M}_A(d_i^{(1)}) = 0$,
thus the subset $\{\mathcal{M}_A\}_{i,L} = \{\mathcal{M}_A(d_i) \mid 1 \leq d_i < 2^{Q-1}\}$ can
be computed concurrently. In the second step,
$\mathcal{M}_A(d_i^{(2)}) = \min_{\forall b} |L^A_{i,b}|$ can already be enumerated
while the subset $\{\mathcal{M}_A\}_{i,H} = \{\mathcal{M}_A(d_i) \mid 2^{Q-1} \leq d_i < 2^Q\}$
is computed. This approach only requires $2^{Q-1} - 1$ adders
independently from $M_T$, yielding adder savings of 36 % for
16 QAM and 45 % for 64 QAM. Furthermore, for an ONPC
architecture, no latency is added since the subsets $\{\mathcal{M}_A\}_{i,L}$
and $\{\mathcal{M}_A\}_{i,H}$ can be computed during the enumeration of $s_{A,i}^{(1)}$
and $s_{A,i}^{(2)}$. Further resource sharing would result in limited gains
while significantly increasing irregularity.
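The adder counts quoted above follow directly from the two closed-form expressions. The short sketch below simply evaluates them for a given number of bits per symbol $Q$; the function name `adder_counts` is hypothetical.

```python
def adder_counts(q):
    """Return (additions without sharing, shared adders, relative saving)."""
    full = 2 ** q - q - 1         # additions needed for all 2^Q metrics
    shared = 2 ** (q - 1) - 1     # adders with the two-phase sharing
    return full, shared, 1 - shared / full

full_16, shared_16, saving_16 = adder_counts(4)   # 16 QAM
full_64, shared_64, saving_64 = adder_counts(6)   # 64 QAM
```

For 16 QAM this gives 11 additions versus 7 adders, and for 64 QAM 57 versus 31, i.e. savings of about 0.364 and 0.456, which agrees with the percentages stated in the text up to rounding.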

The second issue is sorting $\{\mathcal{M}_A\}_i$. Since latency is typically
a serious issue for run-time constrained depth-first SD,
an approach has been chosen that does not add latency for the
sorting of $\{\mathcal{M}_A\}_i$. The ONPC principle allows a minimum
search for $\mathcal{M}_{A,\min}$ over the set $\{\mathcal{M}_A\}_i$ for the enumeration
of the current antenna $i$, masked by the enumerated-nodes
flags. The resulting binary tree of compare-select (CS) units
would dominate the critical path already for 16 QAM.

However, the properties of equation (9) can be exploited
to remove almost all comparators and CS dependencies
for the first three CS levels. The principle can be
explained easily by considering the removal of the first
level: for pairs $\{\mathcal{M}_A(s_i^{(k)}), \mathcal{M}_A(s_i^{(l)})\}$ whose symbols differ in
only one bit $b$, the larger metric is the one
with $x_{i,b} \neq \mathrm{sign}(L^A_{i,b})$. This kind of decision does not
need any metric comparison but can be determined by single-bit
comparisons of sign bits and enumerated-nodes flags.
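The removal of the first CS level can be checked with a small software model. The sketch below assumes the approximate a priori metric of equation (9) and ignores the enumerated-nodes flags for brevity; `apriori_metric` and `min_of_pair_by_sign` are hypothetical helper names.

```python
def apriori_metric(bits, llrs):
    """Sum of |L^A| over bits whose value contradicts the sign of L^A."""
    return sum(abs(l) for x, l in zip(bits, llrs)
               if x != (1 if l >= 0 else -1))

def min_of_pair_by_sign(label_k, label_l, llrs, b):
    """Pick the label with the smaller M_A among two labels differing only in bit b.

    Only the sign of L^A for bit b is inspected; no metric is compared.
    """
    sign_b = 1 if llrs[b] >= 0 else -1
    return label_k if label_k[b] == sign_b else label_l

llrs = [2.0, -1.5]
label_k = (+1, +1)
label_l = (+1, -1)            # differs from label_k in bit 1 only
winner = min_of_pair_by_sign(label_k, label_l, llrs, 1)
```

Because the two labels share all other bits, their metrics differ exactly by the magnitude of the LLR of the differing bit, so the sign test always agrees with a full metric comparison.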
Selecting the minimum of 4-tuples (first two CS tree levels)
differing in only two bits $b_m$ and $b_n$
requires an additional comparison $|L^A_{i,b_m}| \lessgtr |L^A_{i,b_n}|$. However,
this extra comparison is the same for all 4-tuple sub-trees and
does not depend on intermediate results generated in the CS
tree. Therefore, the critical path is significantly reduced. The
extension to 8-tuples (first three CS tree levels) has a total of
only six parallel comparators. Thus, only one CS unit and two
8:1 multiplexers are required for 16 QAM and only seven CS
units and eight 8:1 multiplexers for 64 QAM. Compared with a
full CS tree, the comparator savings are 53 % in total and 50 %
in the critical path for 16 QAM and 79 % in total and 33 %
in the critical path for 64 QAM. Extensions to higher orders
than 8-tuples are possible but would result in an exponential
complexity increase.

4) Pruning-criteria checks: In [6], the checks of the pruning
criteria of equations (6) and (7) have been simplified to
a single pruning-criterion check of equation (7) in order to
reduce hardware complexity, at the cost of a slight increase of
$N_{en}$. For the SISO STS SD architecture proposed in this paper,
the implementation of two different pruning criteria
is mandatory to prevent a further significant increase of $N_{en}$.
In order to avoid extra delays on the critical path, the
pruning-criteria checks are not implemented as maximum searches but
as pairs of $M_T Q$ fully parallel comparators
$\mathcal{M}_P^{\mathrm{down}} > \Lambda_{i,b}^{\overline{\mathrm{MAP}},\mathrm{cur}}$
and $\mathcal{M}_P^{\mathrm{sibl.}} > \Lambda_{i,b}^{\overline{\mathrm{MAP}},\mathrm{cur}}$, followed by simple bit-masking and
combining.

V. ASIC Synthesis Results

The architecture presented in the previous section has been
implemented in VHDL including parameters for word lengths,
$M_T$, QAM order and a switch to enable/disable soft-input
support. A representative set of parameter combinations has
been instantiated by layout-aware gate-level synthesis^1.

Since both the soft-output-only base architecture and the
SISO architecture follow the ONPC principle, their throughput
can be determined by

$$\Theta = \frac{r Q M_T}{E[N_{en}]} \cdot f_{\mathrm{clk}} \quad \text{[bit/s]} \qquad (11)$$

with $r$ being the code rate and $E[N_{en}]$ being the average $N_{en}$.
The curves for the iterative throughput $\Theta$ and the cumulative
$E[N_{en}]$ for a 4x4 16-QAM MIMO system^2 achieving a frame error rate
(FER) of 1 % are given in Figure 3, including as a reference
the cumulative $E[N_{en}]$ obtained by SE ordering and floating-point
operations. In the 4th iteration the hybrid-enumeration
algorithm introduces an overhead of less than 28 % in terms of
$E[N_{en}]$. The least-effort throughput in Figure 3 is derived from
equation (11) by selecting the minimum cumulative $E[N_{en}]$
among all iterations for a specific SNR. The intersections of
the cumulative $E[N_{en}]$ curves determine the SNR points for
changing the number of iterations. In Figure 3 the switching
points between 1 and 2, between 2 and 3, and between 3 and 4
iterations are marked.

Fig. 3. Cumulative $E[N_{en}]$ and iterative least-effort throughput $\Theta$ over
minimum SNR for 1 % FER for the 4x4 16-QAM architecture. Numbers annotated
to cumulative $E[N_{en}]$ curves are normalized clipping levels $N_0 L_{\max}$. As in
[5], one iteration is defined as one use of the SISO MIMO demapper and the
SISO channel decoder (1st iteration corresponds to soft-output-only SD).

Area and delay of this architecture are quite sensitive to the
fixed-point word lengths. Therefore, the word lengths have
been carefully selected to make the FER-performance loss
negligible with respect to floating-point operation^3.

Fig. 4. Parametrization design space of the proposed STS SD architecture.
Area is measured in gate equivalents (GEs). One GE corresponds to the area
of a two-input drive-one NAND gate.

^1 UMC 90 nm standard-performance CMOS library, typical case, Synopsys
Design Compiler 2009.06-SP1 in topographical mode.

^2 Throughout this paper we use a system with an i.i.d. Rayleigh fading
channel, perfect channel knowledge and SQRD [10]. The BICM transmission
is set up with a convolutional channel code (rate 1/2, generator polynomials
$[133_o, 171_o]$, constraint length 7) decoded by a max-log BCJR channel
decoder with perfect termination knowledge and an S-random interleaver
corresponding to 512 information bits. The SNR is defined as SNR = $M_T E_s / N_0$,
with $E_s = E[|s|^2], s \in \mathcal{O}$. $P[s_i]$ is approximated by equation (9). The VLSI
architecture internally operates on normalized metrics $\mathcal{M}_{\mathrm{norm.}} = N_0 \mathcal{M}$ to
avoid division by $N_0$; normalized clipping levels are given by $N_0 L_{\max}$.

Figure 4 shows the synthesis results for representative
parameter sets. The results for the soft-output-only case are
comparable to the implementation published in [6]. Since the
two base architectures are similar, they are close in terms of
area. The timing differs, mainly for two reasons. First, Figure 4
shows pre-layout synthesis results for a 90 nm technology
whereas those in [6] are post-layout results for a 250 nm
technology scaled to 90 nm by $f_{90} \approx \frac{250}{90} f_{250}$. Second, the
architectures differ in their pipeline and enumeration schemes.

By enabling soft-input processing for the 4x4 16-QAM
reference, the area increases by 57 % from 61 kGates to
96 kGates, while the clock frequency degrades by 34 % from
379 MHz to 250 MHz. We can conclude that the additional
cost for soft input is affordable given the prospect of working in
lower SNR regimes with iterative systems.

^3 Word lengths [integer.fractional] for 4x4 16 QAM: $\tilde{y}_i$ [6.7], $R_{i,j}$ [4.7],
$L^A_{i,b}$ [9.5], $L^E_{i,b}$ [9.5], $\mathcal{M}_{\{C,A,P\}}$ [9.6]. A QAM-order increase of factor 4
requires one more integer bit for $\tilde{y}_i$ per real/imaginary part and two more
integer bits for $\mathcal{M}_{\{C,A,P\}}$, $L^A_{i,b}$ and $L^E_{i,b}$. Doubling $M_T$ requires one more
integer bit for $\mathcal{M}_{\{C,A,P\}}$, $L^A_{i,b}$ and $L^E_{i,b}$.
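Equation (11) turns a clock frequency and an average node count into a coded throughput. The sketch below evaluates it for the 4x4 16-QAM SISO configuration at 250 MHz with the rate-1/2 code used in this paper; the average number of examined nodes (20) is a purely hypothetical placeholder, since the measured values are only available from Figure 3, and the function name `throughput_bps` is an assumption.

```python
def throughput_bps(code_rate, q, m_t, avg_nodes, f_clk_hz):
    """Throughput in bit/s: r * Q * M_T / E[N_en] * f_clk."""
    return code_rate * q * m_t / avg_nodes * f_clk_hz

# 4x4 16 QAM (Q = 4), rate 1/2, 250 MHz; avg_nodes = 20 is an assumed value.
theta = throughput_bps(0.5, 4, 4, 20.0, 250e6)
```

With these numbers each detected vector carries 8 information bits and occupies 20 cycles on average, which corresponds to 100 Mbit/s; halving the average node count would double the throughput.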

The proposed architecture scales almost linearly with $M_T$
in terms of area. The critical path degrades by less than
10 % when doubling $M_T$. When increasing the QAM order
by a factor of 4 in the soft-input case, the area is less than
doubled while the frequency degrades by less than 20 % to 25 %,
despite the enumeration being significantly affected.

VI. Conclusion

To the best of our knowledge, we introduced the first SISO STS
SD architecture, enabling iterative STS SD-based receivers.
The parametrized architecture offers very good scalability over
$M_T$ and the QAM order. The approximate hybrid-enumeration
method enables the implementation of iterative STS-based
MIMO receivers, although high data-rate communication systems
may require multiple parallel SD instances to meet the
throughput constraints. We believe that the algorithms and
hardware-design principles presented in this paper are suitable
for most kinds of SD architectures. Our future development
will focus on further enhancements of the architecture, based
for instance on the ideas proposed in [6].